Identification and analysis of transcription factor family-specific features derived from DNA and protein information
Abstract
A common approach for understanding the relationship between transcription factors (TFs) and transcription factor binding sites (TFBSs) is to use features at either the TF level or the DNA level. For a given TF family, features can be derived from the DNA-binding domains at the protein level as well as from TF binding sites at the DNA sequence level. Here we investigate the relative importance of features from these different levels for the main TF families to better understand: (1) family-specific features and (2) the proportion of features from either the DNA or protein level. We perform class-wise feature selection on TF families to identify important features for each family. The importance of the selected features is assessed in terms of the predictive accuracy of assigning TFs and associated TFBSs to the correct TF families. Evaluation of the best model on an independent test set resulted in a predictive accuracy of 90%. Analysis of the selected features used in the best model on a family-by-family basis is congruent with the fact that the interaction between TF proteins and TFBSs in the DNA is quite family specific. Our analysis further suggests that: (1) this approach can be used to determine and better understand which features (at both the DNA and protein levels) are important to consider for each TF family, and (2) a similar approach to combining DNA and protein level features may be useful for other datasets where protein–DNA interaction is a key component of biological function. © 2009 Elsevier B.V. All rights reserved.
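The abstract does not say which selection criterion the authors used, so the following is only a minimal sketch of what class-wise feature selection over combined DNA and protein features could look like. It assumes a simple one-vs-rest standardized mean difference as the score, which is a stand-in and not the paper's method. The function name, the feature names (`dna_*`, `prot_*`), the family labels, and all numeric values are hypothetical and invented for illustration.

```python
from statistics import mean, pstdev


def classwise_feature_ranking(samples, labels, feature_names):
    """Rank features separately for each class (one-vs-rest).

    Score (an assumption, not the paper's criterion): absolute difference
    between the in-class mean and the out-of-class mean, divided by the
    standard deviation of the feature over all samples.
    """
    rankings = {}
    for family in sorted(set(labels)):
        inside = [s for s, y in zip(samples, labels) if y == family]
        outside = [s for s, y in zip(samples, labels) if y != family]
        scores = []
        for j, name in enumerate(feature_names):
            in_vals = [s[j] for s in inside]
            out_vals = [s[j] for s in outside]
            spread = pstdev(in_vals + out_vals) or 1.0  # guard against zero spread
            scores.append((abs(mean(in_vals) - mean(out_vals)) / spread, name))
        # Highest-scoring feature first.
        rankings[family] = [n for _, n in sorted(scores, key=lambda t: -t[0])]
    return rankings


# Hypothetical toy data: two DNA-level and two protein-level features.
feature_names = ["dna_gc_content", "dna_motif_width",
                 "prot_zinc_count", "prot_basic_fraction"]
samples = [
    [0.60, 10, 8, 0.10], [0.62, 11, 9, 0.12],   # "zinc_finger"
    [0.50, 8, 0, 0.35], [0.48, 8, 1, 0.33],     # "bZIP"
    [0.30, 6, 0, 0.15], [0.32, 7, 1, 0.14],     # "homeodomain"
]
labels = ["zinc_finger", "zinc_finger", "bZIP", "bZIP",
          "homeodomain", "homeodomain"]

rankings = classwise_feature_ranking(samples, labels, feature_names)
for family, ranked in rankings.items():
    print(family, "->", ranked[:2])
```

In this toy setting the top-ranked feature comes from the protein level for some families and from the DNA level for others, which is the kind of family-by-family comparison the abstract describes. A real analysis would then train a classifier on the selected features and measure accuracy on a held-out test set.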
Related articles
Identification, isolation and bioinformatics analysis of specific tuber promoter in plants
In this study, in order to find the suitable tuber promoter, an experiment was conducted in Shahid Beheshti University in 2018. For this purpose, promoter sequences of different tuberous plants were searched at NCBI. Sequences were multiple-aligned and the target primers designed from conserved regions. PCR analysis confirmed the presence of the desired promoter in plants of sweet potato a...
In silico identification of miRNAs and their target genes and analysis of gene co-expression network in saffron (Crocus sativus L.) stigma
As an aromatic and colorful plant of substantive taste, saffron (Crocus sativus L.) owes such properties of matter to growing class of the secondary metabolites derived from the carotenoids, apocarotenoids. Regarding the critical role of microRNAs in secondary metabolic synthesis and the limited number of identified miRNAs in C. sativus, on the other hand, one may see the point how the characte...
Identification of a Novel Splice Site Mutation in RUNX2 Gene in a Family with Rare Autosomal Dominant Cleidocranial Dysplasia
Introduction: Pathogenic variants of RUNX2, a gene that encodes an osteoblast-specific transcription factor, have been shown as the cause of CCD, which is a rare hereditary skeletal and dental disorder with dominant mode of inheritance and a broad range of clinical variability. Due to the relative lack of clinical complications resulting in CCD, the medical diagnosis of this disorder is challen...
Sequencing and phylogenetic study of APETALA1 homologous gene in garden cress (Lepidium sativum L.)
The flowering process in plants proceeds through the induction of an inflorescence meristem triggered by several pathways. Many of the genes associated with these pathways encode transcription factors of the MADS domain family. The MADS-domain transcription factor APETALA1 (AP1) is a key regulator of flower development. The first step to understand the molecular mechanisms under the function of...
IDENTIFICATION OF MOLECULAR MARKERS LINKED TO LEAF CURL VIRUS DISEASE RESISTANCE IN COTTON
The identification of molecular markers linked to leaf curl virus (CLCuV) disease resistance in cotton has the potential to improve both the efficiency and the efficacy of selection in cotton breeding programs. Genetic analysis suggested that CLCuV resistance is controlled by a single dominant gene. In this study, an interspecific F2 population derived from a cross of Gossypium barbadense and G...
Journal:
- Pattern Recognition Letters
Volume 31
Publication date: 2010